A BAYESIAN APPROACH TO ASSIST IN 
THE DIAGNOSIS OF CORONARY HEART DISEASE 



William Randolph Condos



Library. 

Naval Postgraduate School 
Monterey, California 93940 






A BAYESIAN APPROACH TO ASSIST IN 
THE DIAGNOSIS OF CORONARY HEART DISEASE 

by 

William Randolph Condos, Jr. 
and 

Everett William Knox 

Thesis Advisor: M. U. Thomas 



March 1973 







Approved for public release; distribution unlimited.






A Bayesian Approach to Assist in 
the Diagnosis of Coronary Heart Disease



by 



William Randolph Condos, Jr.
Captain, United States Army 
B.S., United States Military 
Academy, 1967 



Everett William Knox 
Captain, United States Army 
B.E.E., Clarkson College of
Technology, 1963 



Submitted in partial fulfillment of the
requirements for the degree of



MASTER OF SCIENCE IN OPERATIONS RESEARCH 



from the

NAVAL POSTGRADUATE SCHOOL 
March 1973 






ABSTRACT 

The objectives of this thesis were to design a method 
for evaluation of the diagnostic potential of available 
indicators of coronary heart disease (CHD) and to present a 
systematic, quantitative procedure for aiding in its diag- 
nosis. A sample space of patients was divided into two 
mutually exclusive groups, those with angiographic evidence 
of CHD, and those with no CHD. Active duty or retired 
military men between the ages of 30 and 67 years constituted 
the sample space. Tests and risk factors were available in 
the medical literature that a doctor could view as an indi- 
cator or contraindicator of CHD. A vector of these possible 
indicators was established and the diseased group was com- 
pared to the non-diseased group in an effort to evaluate 
the diagnostic potential of the indicators. This was done
by discriminant analysis in conjunction with a Bayesian 
method of weighting the importance of test results. The 
important indicators were then used to formulate a model for 
diagnosing CHD based on a Bayes' decision technique. 






TABLE OF CONTENTS

I. INTRODUCTION AND SCOPE
II. BACKGROUND
III. DESCRIPTIVE MODEL
IV. QUANTITATIVE METHODS
    A. INDICATORS AND WEIGHTING FACTORS
    B. BAYESIAN DIAGNOSTIC MODEL
V. CLINICAL TESTS AND OBSERVATIONS
VI. SENSITIVITY
VII. RESULTS AND CONCLUSIONS
VIII. AREAS FOR FUTURE STUDY
APPENDIX A Sample Data Collection Sheet
APPENDIX B Bayes' Diagnostic Model Fortran Flow Chart
APPENDIX C Bayes' Diagnostic Model Fortran Program Listing
APPENDIX D Sample Computer Output
LIST OF REFERENCES
INITIAL DISTRIBUTION LIST
FORM DD 1473






I. INTRODUCTION 



Heart attacks resulting from coronary heart disease 
(CHD) cause more deaths each year than cancer, strokes, and 
accidents combined. These deaths also include a broader 
spectrum of the population than in previous years. In the 
last century, heart disease was viewed as a natural result 
of growing old. But with the transition from a rural to 
an urban society, and the inherent traits of tension, rich 
diet, and lack of exercise, the propensity for heart disease 
has increased. This increase can be seen in the steady 
rise in the number of heart attacks among men over the past 
20 years. The American Heart Association reported that of 
the 675,000 deaths from CHD expected during the past year, 
176,000 would have been men and women under the age of 65 
[Ref. 16]. 

Medical capabilities have greatly increased, giving 
coronary heart disease patients a greater probability of 
survival once they are under medical care, but since over 
half of those who die never reach a hospital, the problem 
of predicting coronary heart disease becomes very important. 
This diagnostic problem gains additional importance because 
of the lack of a proven method for the treatment of CHD in 
its advanced stages. Furthermore, there is an increased 
presence of asymptomatic CHD that may go undetected with 
present diagnostic criteria. 






In this study an attempt has been made to consolidate 
a spectrum of risk factors that can be incorporated into 
diagnostic procedures for CHD. Specifically, the objectives 
were to design a method for evaluation of the diagnostic 
potential of available indicators of CHD and to present a 
systematic, quantitative procedure for aiding in its diagnosis. 

A sample space of patients was divided into two mutually 
exclusive groups, those with angiographic evidence of CHD, 
and those with no CHD. Active duty or retired military men 
between the ages of 30 and 67 years constituted the sample 
space. There were certain tests and risk factors available 
in the medical literature that a doctor could view as an 
indicator or contraindicator of the disease. Having 
established a vector of these possible indicators, the 
diseased group was compared to the nondiseased group in an 
effort to evaluate the diagnostic potential of the indicators. 
This was done by discriminant analysis in conjunction with 
a Bayesian method of weighting the importance of test 
results. The important indicators were then used to 
formulate a model for diagnosing CHD based on a Bayes'
decision technique. 






II. BACKGROUND 



Probabilistic and computer aided designs to aid decision 
makers in medical diagnosis have been a promising area of 
research for some time, and an abundant literature on these 
subjects exists [Refs. 8, 10]. They have had little impact
on the practice of medicine, however, and several
characteristic reasons have been given. Among them are
insufficient data bases owing to the poor quality, lack
of uniformity, or inaccessibility of medical records. In
addition, there appears to be a lack of understanding and 
interface between the medical profession and those who 
would apply probabilistic procedures to aid the medical 
decision makers. 

Recent years have shown an increase in research efforts 
aimed at the prevention and diagnosis of CHD. At the 
present time, however, coronary arteriography appears to 
be the only completely definitive test for the disease 
[Refs. 4, 12]. Unfortunately, this is a costly surgical 
procedure that requires hospitalization and involves 
definite mortality and morbidity factors, depending on the 
age and health of the patient. Arteriography is currently 
only available at large medical centers because of the 
equipment and expertise required. 

Some diagnostic models for CHD tend to consider only 
symptomatic patients, usually those with typical angina. 






This omits many subjects who are asymptomatic, some of
whom may be suffering from silent heart disease.

The medical literature cites commonly accepted indica- 
tors for CHD. Widely used indicators cited are history of 
ischemic episodes, age, total cholesterol, triglycerides, 
resting EKG, smoking, and family history [Refs. 4, 12, 16]. 
Less commonly used indicators that are also cited are race, 
blood type, and blood pressure [Refs. 4, 9]. In addition, 
the exercise test has recently gained widespread acceptance 
as a good CHD indicator [Refs. 1, 6]. The relative impor- 
tance of this test in conjunction with other indicators 
has not yet been thoroughly investigated. 

It seems appropriate that a diagnostic model for 
predicting CHD should investigate the potential of an 
exhaustive list of indicators and tests for the disease. 
This diagnostic model should also reduce the subjectivity 
in the decision making of the doctor by increasing the 
amount of objective evidence through the appropriate indi- 
cators and tests. 






III. DESCRIPTIVE MODEL 



The flow of patients to a cardiac clinic is similar 
to the input of any other specialty clinic. A patient may 
be referred to the cardiologist by another doctor based 
on the results of a physical examination or, if a person 
believes that he is suffering from a cardiac or cardiac- 
related illness, he may voluntarily seek the advice of the 
specialist directly. In either case, by the time a patient
is admitted to the cardiologist's office, certain data on
him are already available to the physician
without specified testing. From that point on, however,
the diagnosis of a possible heart disease is a function of 
the doctor's ability to assign relative importance to the 
appropriate indicators. Costs of associated testing, the 
procedures available, the patient, and the patient's health 
may also have a bearing on the doctor's ability to diagnose 
correctly. 

The cardiologist then may be viewed as a decision maker
who, for each patient, receives an amount of initial information
$I_0$ from which he initiates a sequence of decisions,
gaining additional information $I_0'$ as a result of testing.
Figure 1 shows a schematic of these decision processes.






FIGURE 1

DECISION PROCESSES OF CARDIOLOGIST

[Schematic not reproduced. Surviving labels: $I_0'$, Outcomes.]



As an illustration of the concepts implied in Figure 1,
consider that a patient is referred to the cardiologist
because he has symptoms of CHD. At decision node $D_0$ the
doctor evaluates the information he has available. Usually
this is information readily available in the patient's
medical record. Based on this information, the doctor has
two choices at $D_0$: diagnosing the patient or requesting
additional testing. If, for example, the doctor chooses to
perform a test, decision node $D_1$ represents the choice the
doctor must make from the clinical tests available. Once
the choice is made, $I_0'$ represents the information that results
from the outcome of the test. The doctor is again faced
with the decision to be made at $D_0$, but he now has the new
information $I_0'$, which reduces the chance of an incorrect
diagnosis.

A summary is presented in Table 1 that shows the possible 
path of a patient through a diagnostic sequence. 






TABLE 1

PATIENT ADMITTED TO THE CARDIOLOGIST

AVAILABLE INFORMATION (A)
    Race, Sex, Age, Height, Weight, Blood Pressure,
    Blood Type, Family History of Heart Disease, Smoking
    History, History of Ischemic Episodes

FURTHER TESTING SPECIFIED BY CARDIOLOGIST

CLINICAL TESTS (B) ($I_0'$)
    Resting EKG
    Exercise EKG
    Triglycerides
    Cholesterol
    Angiogram



This summary does not dictate a specified sequence of tests
or weightings of relative importance. The information in
(A) is data available (facts about the patient) that are
easily obtained without testing. Tests in (B) require
expert judgment or clinical procedures and, again, are not
ordered in any sequence of importance. In practice, not
all of the listed indicators are used for decision making.
Some may be considered by a particular doctor to be unimportant.
It is also difficult to assign subjective probabilities
to some of the indicators about which little is
known. Furthermore, it is impractical to correlate the
contributions of a large number of indicators without some
type of objective model.







IV. QUANTITATIVE METHODS 



In general, there are two approaches to medical decision 
problems. The first is to develop and perfect a model that 
predicts as well as or better than a physician. The second 
approach consists of improving ways to aggregate, weight, 
and use information available to the physician so that his 
personal diagnosis will be conducted from a substantially 
sounder base. This latter approach, which is commonly 
called "bootstrapping" [Ref. 10], was the one selected for
this study. 

A set of CHD indicators was identified and evaluated 
experimentally using discriminant analysis. A proposed 
method of assigning weighting factors based on the "posterior 
odds" of the various indicator levels was incorporated into 
the analysis. These results were then integrated into a 
Bayesian diagnostic model. 

A. INDICATORS AND WEIGHTING FACTORS 

At decision node $D_1$ of Figure 1, the doctor must decide
what test to use next in his evaluation of the patient. To
do this he must have a knowledge of what indicators of CHD
have been evaluated and the amount of additional information,
$I_0'$, he can expect to obtain from these indicators. Complicating
the doctor's evaluation is the division of the
indicators into two types, qualitative and quantitative.

The quantitative indicators are tests in which the outcome 






is represented on an acceptable numerical scale. Of the 
indicators used in this paper, only triglycerides, choles- 
terol, age, and blood pressure were quantitative variables. 
The other indicators shown in Table 1 (except height and 
weight which were not used) have results which have no 
numerical scale and must be interpreted qualitatively. 

For example, the indicator called history of ischemic 
episodes requires the patient to verbalize his history of 
chest pain. Also included in the category of qualitative 
indicators are tests in which the result is numerical but 
lacks meaning unless expressed in qualitative terms. The 
exercise EKG result, for example, is in millimeters of 
depression (or elevation) of the S-T segment, but is inter- 
preted in terms of being positive or negative. 

As pointed out previously, these indicators and their 
relative merit were determined from clinical judgment and 
varied among cardiologists. In addition, the relative impor- 
tance of various outcomes of any specific test also varied 
among doctors. To alleviate these problems, a two-step 
procedure was used. First, the outcomes of the qualitative 
tests were assigned weighting factors using Bayes' Theorem.
Second, the qualitative variables and quantitative variables 
were integrated into a relative ranking using a stepwise 
discriminant analysis computer routine [Ref. 11]. 

Consider a particular qualitative variable $i$ for which
$P(t_{ij} \mid D)$ is the conditional probability of outcome $j$
given a patient has CHD. The posterior probability of CHD
(i.e., in light of this information) is

$$P(D \mid t_{ij}) = \frac{P(t_{ij} \mid D)\,P(D)}{P(t_{ij} \mid D)\,P(D) + P(t_{ij} \mid \bar{D})\,P(\bar{D})} \tag{1}$$



where P(D) is the presumably known prior probability of CHD. 
Each of these probabilities on the right hand side of 
equation (1) can be estimated from past data. The results 
are a vector of values for the outcomes of a specific test 
which could then be used with the outcomes of other tests 
in a stepwise discriminant analysis computer routine. However,
in order to give more meaning to the weighting factors,
$w_{ij}$, they were normalized using

$$w_{ij} = \frac{P(D \mid t_{ij})}{\min_k \{P(D \mid t_{ik})\}} \tag{2}$$



where it was arbitrarily decided to use the minimum outcome 
in order to show increasing likelihood of disease as the 
value of the weighting factor increased. 

Consider the following simple example to illustrate the 
procedure for computing weighting factors. Suppose it is 
desirable to find weighting factors for the qualitative 
variable "race" (i = R) which, for the purpose of illustration,
has two outcomes: NEGRO (j = 1) and CAUCASIAN (j = 2). 

Suppose further that the prior distribution of CHD is
$P(D) = 0.1$ and data reveal that $P(t_{R1} \mid D) = 0.2$ and
$P(t_{R1} \mid \bar{D}) = 0.4$. It then follows from equations (1) and (2)
that the weighting factors are $w_{R1} = 1.0$ and $w_{R2} = 2.46$.
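
The worked example can be checked numerically. The following is a minimal sketch, assuming the two quoted likelihoods are P(t_R1 | D) = 0.2 and P(t_R1 | not D) = 0.4 and that the second outcome takes the complements 0.8 and 0.6. The function names are hypothetical and are not taken from the thesis program. Under that reading the second weight comes out near 2.45, close to the 2.46 quoted in the text.

```python
def posterior(p_t_given_d, p_t_given_not_d, prior):
    """Equation (1): P(D | t) from the two likelihoods and the prior P(D)."""
    num = p_t_given_d * prior
    return num / (num + p_t_given_not_d * (1.0 - prior))


def weighting_factors(lik_d, lik_not_d, prior):
    """Equation (2): each outcome's posterior divided by the smallest posterior."""
    post = [posterior(a, b, prior) for a, b in zip(lik_d, lik_not_d)]
    floor = min(post)
    return [p / floor for p in post]


# Assumed reading of the race example (see the paragraph above this block):
# outcome 1 has P(t|D) = 0.2 and P(t|not D) = 0.4, outcome 2 the complements.
weights = weighting_factors([0.2, 0.8], [0.4, 0.6], prior=0.1)
print([round(w, 2) for w in weights])
```

Dividing by the smallest posterior fixes the least indicative outcome at 1, so every other weight reads as a multiple of it.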

This method of computing the weighting factors
$\{w_{ij} : i = 1, \ldots, n;\ j = 1, \ldots, m\}$ provides a consistent means of assigning
scores to each of the qualitative variables. This was done
for a particular set of indicators examined in this study
and the results are given in Section VI. Stepwise linear
discriminant analysis [Ref. 11] could, at this point, be
used to develop a linear prediction function

$$L = \sum_{i=1}^{m} \lambda_i X_i$$

where $X$ is the set of all test variables (quantitative and
qualitative), $\lambda$ is the set of all coefficients assigned by
the computer routine, and $m$ is the number of tests. Mahalanobis
distance could then be used as the discrimination
criterion.
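
For reference, the usual two-group form of the Mahalanobis distance is written out below. The symbols are assumptions introduced here, not notation from the thesis: the barred vectors are the mean test results of the CHD and no CHD samples, and S is the pooled within-group covariance matrix. The routine of Ref. 11 may compute an equivalent quantity in a different form.

```latex
D^2 = (\bar{\mathbf{x}}_1 - \bar{\mathbf{x}}_2)^{\mathsf{T}} \, \mathbf{S}^{-1} \, (\bar{\mathbf{x}}_1 - \bar{\mathbf{x}}_2)
```

A larger value means the two group means are farther apart relative to the spread of the test results within each group.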

Cohn [Ref. 4] used this type of linear discriminant 
analysis in its predictive role in a medical decision con- 
text. Use of discriminant analysis for prediction was 
discarded in this paper for two reasons. The technique is 
a valid one when the underlying distributions of the random 
variables of the two samples (in this case, the test results) 
are distributed normally with equal covariance matrices (a 
linearity assumption). A preliminary investigation indicated
that the variance of the test results in the two samples did 
not appear to be equal. Additionally, the normality 






assumption did not appear to be valid in this application. 

The test results had a combination of binomial, multi- 
nomial, and approximately normal distributions. Considera- 
tion of all distributions as normal did not have a sound 
theoretical basis. 

The actual purpose of conducting this portion of the 
analysis was to identify the relative importance among the 
variables. This was accomplished by ordering the resulting 
F-statistics associated with the coefficients ($\lambda$'s) of the
variables ($X$'s). The F-statistic is the ratio of the variability
of the means of the individual test results in
each sample to the pooled variance of the test results. F 
will be large when there is a large difference between the 
mean results of a test in the CHD and the no CHD groups. 
Likewise, the smaller the F, the closer together are the 
mean results for a particular test in the CHD and no CHD 
groups. Thus, an ordering of these computed F-statistics 
from largest to smallest may be considered an ordinal 
ranking of the diagnostic power of the various indicators. 
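
The ranking idea can be sketched with a univariate F computed one indicator at a time. This is a simplification under stated assumptions: a stepwise discriminant routine typically reports F-to-enter values that adjust for variables already in the function, which the sketch below does not do. The function names and the scores are hypothetical and are not data from the study.

```python
from statistics import mean


def f_statistic(group_a, group_b):
    """One-way F for two groups: between-group over pooled within-group variance."""
    n_a, n_b = len(group_a), len(group_b)
    m_a, m_b = mean(group_a), mean(group_b)
    grand = (sum(group_a) + sum(group_b)) / (n_a + n_b)
    # With two groups the between-group sum of squares has 1 degree of freedom.
    between = n_a * (m_a - grand) ** 2 + n_b * (m_b - grand) ** 2
    within = sum((x - m_a) ** 2 for x in group_a) + sum((x - m_b) ** 2 for x in group_b)
    return between / (within / (n_a + n_b - 2))


def rank_indicators(chd, no_chd):
    """Return (indicator, F) pairs ordered from largest to smallest F."""
    scores = {name: f_statistic(chd[name], no_chd[name]) for name in chd}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Made-up scores for illustration only.
chd = {"exercise_ekg": [3, 3, 2, 3], "triglycerides": [150, 160, 140, 155]}
no_chd = {"exercise_ekg": [1, 1, 2, 1], "triglycerides": [148, 158, 142, 152]}
ranking = rank_indicators(chd, no_chd)
```

In this toy input the exercise EKG scores separate the groups cleanly and rank first, while the triglyceride values overlap and rank last.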

B. BAYESIAN DIAGNOSTIC MODEL 

The foregoing procedure of Section IV.A for determining
the relative diagnostic power of the available tests or
indicators provides criteria for the cardiologist to select
appropriate tests at decision node $D_1$ in Figure 1. A
Bayesian method for quantifying the information $I_0$ and
additional information $I_0'$ is now presented.






The development of this model was based on two major 
assumptions. First, it was assumed that patients being 



tested either had CHD or did not have CHD. Thus, the case 
of a patient having multiple diseases was excluded here. 

The second assumption was that the data on both qualitative
and quantitative variables were conditionally independent.

Let

$P(D_i)$ = a priori probability of CHD ($D_1$), or no CHD ($D_2 = \bar{D}_1$).

$P(D_i \mid S_1, \ldots, S_n)$ = a posteriori probability of $D_i$ given
symptoms, or indicator levels, $S_1, \ldots, S_n$.

$P(S_1, \ldots, S_n \mid D_i)$ = conditional probability of symptoms
$S_1, \ldots, S_n$ given $D_i$.

The first assumption merely requires that $P(\bar{D}_1) = P(D_2)$
or, equivalently, $P(D_2) = 1 - P(D_1)$. The second assumption, in terms of the
above notation, says that

$$P(S_1, \ldots, S_n \mid D_i) = \prod_{j=1}^{n} P(S_j \mid D_i) \tag{3}$$



It then follows from Bayes' Theorem that

$$P(D_i \mid S_1, \ldots, S_n) = \frac{P(D_i) \prod_{j=1}^{n} P(S_j \mid D_i)}{\sum_{k=1}^{2} P(D_k) \prod_{j=1}^{n} P(S_j \mid D_k)}$$

which is in terms that can be calculated using subjective
probabilities (doctor's medical opinions) and frequentistic
procedures [Ref. 8].

The majority of the conditional probabilities were
calculated using frequentistic procedures. Subjective
probabilities were used when the data base was insufficient.
In cases where a patient was missing the $j$-th symptom on his
medical records or the patient was unable to take the test,
the conditional probabilities $P(S_j \mid D_1)$ and $P(S_j \mid D_2)$ were
set equal to .5 (i.e., $P(S_j \mid D_1)$ and $P(S_j \mid D_2)$ were equally
likely and thus had no influence on the associated probabilities).
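
The posterior calculation and the handling of missing results can be sketched as follows. This is a minimal illustration of the two-hypothesis form under the conditional independence assumption, not the thesis's Fortran program (Appendix C). The function name, the use of None for a missing result, and the numbers are all hypothetical.

```python
from math import prod


def posterior_chd(prior, lik_chd, lik_no_chd):
    """Posterior P(D1 | S1..Sn) under conditional independence.

    lik_chd[j] and lik_no_chd[j] are P(Sj | D1) and P(Sj | D2) for the
    observed level of indicator j; None marks a missing result.
    """
    # A missing indicator gets 0.5 under both hypotheses, so it cancels out.
    a = [0.5 if p is None else p for p in lik_chd]
    b = [0.5 if p is None else p for p in lik_no_chd]
    num = prior * prod(a)
    return num / (num + (1.0 - prior) * prod(b))


# Illustrative likelihoods only; the middle indicator is missing.
with_gap = posterior_chd(0.04, [0.6, None, 0.7], [0.1, None, 0.2])
without_gap = posterior_chd(0.04, [0.6, 0.7], [0.1, 0.2])
```

Both calls return the same posterior, which is the sense in which a result set to .5 under each hypothesis has no influence.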

The Bayesian diagnostic model was developed because it 
provided several distinct advantages over general discrimi- 
nant analysis techniques commonly used for medical decision 
making. The first advantage was the use of subjective 
a priori probabilities. Each doctor has his own feelings
and experience concerning the probability of CHD in a patient. 
The second advantage was that the Bayesian model is self- 
updating. After each patient has been diagnosed, his charac- 
teristics can be easily added to the data providing new 
a priori probabilities. This allows the doctor to see trends
that may develop, providing the stimulus for research in 
these areas. The data base is continuously enlarged in this 
manner, improving the diagnostic accuracy of the model. The 
third advantage is that CHD is only a small part of the 






diagnostic problem facing the doctor. The Bayesian approach
allows for the expansion of the set of hypotheses. In the present
model only one pair of hypotheses is treated, no CHD or CHD. However,
this could easily be expanded to no disease, CHD,
liver disease, etc. An important aspect of this is that as
the number of data points in the data vector and the number
of hypotheses are increased, the accuracy of the model
improves.
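
The self-updating property can be pictured as a running tally from which relative frequencies are re-estimated after each diagnosed patient. The sketch below is an illustration under assumptions: the class, its method names, and the group and level labels are hypothetical, and it covers only the conditional frequencies P(level | group), not the prior.

```python
from collections import defaultdict


class IndicatorCounts:
    """Running tallies from which P(level | group) is re-estimated."""

    def __init__(self):
        # counts[group][indicator][level] -> number of patients seen
        self.counts = defaultdict(lambda: defaultdict(lambda: defaultdict(int)))

    def add_patient(self, group, findings):
        """Record one diagnosed patient; findings maps indicator -> level."""
        for indicator, level in findings.items():
            self.counts[group][indicator][level] += 1

    def conditional(self, group, indicator, level):
        """Relative frequency of a level within a group, or None if unseen."""
        levels = self.counts[group][indicator]
        total = sum(levels.values())
        return levels[level] / total if total else None


tally = IndicatorCounts()
for level in ("non-smoker", "about 1 pack", "about 1 pack"):
    tally.add_patient("CHD", {"smoking": level})
p = tally.conditional("CHD", "smoking", "about 1 pack")
```

Returning None for an indicator with no recorded patients leaves room for a subjective probability to be supplied instead, as the text describes for an insufficient data base.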






V. CLINICAL TESTS AND OBSERVATIONS 



Data were derived from three sources. The first source
was generated by testing a sample of individuals undergoing
routine physical examinations at Fort Ord Army Hospital.
A data collection sheet was developed that was simple, yet
comprehensive enough to show whether trends developed in
areas not considered important in the initial analysis
(see Appendix A).

The second source of data was the medical records at 
Letterman General Hospital, San Francisco. A data sheet 
similar to that of the Fort Ord sample was used. However, 
several problem areas were encountered. The first was the 
problem of definition and interpretation. Many records 
showed information such as "positive" family history with
no explanation of what the doctor’s opinion was based on. 
Others had entries such as "30 pack year history" of 
smoking. This type of data does not differentiate between 
two packs per day for 15 years or three packs per day for 
10 years. Since intensity of smoking may be an important 
variable, much valuable data were lost. Another problem in 
this area was the omission of data that were assumed to be 
normal. If a patient's test result was abnormal, the 
result was noted in the patient's record. However, if
nothing was noted, it was not clear whether the test result
was normal or had simply been omitted. It is clear






that personalities become an important factor in the writing 
and in the reading of medical records. However, it is felt 
that as more records are automated these problems will be 
greatly reduced. 

The problem of missing data was the major obstacle 
encountered with the CHD population. The majority of the
patients did not have all the test results in their files. 
The only solution to this problem is to increase the sample 
size so that patients with missing data can be removed from 
the sample. But since one of the major objectives of this 
paper was to develop a method, the missing data problem will 
not be considered within this framework. For information 
concerning decision making with missing data, see Ref. 4. 

The third source of data was the medical literature. 

This was used to establish a priori probabilities of CHD
when it was felt that the experimental sample was too small, 
making the sample probabilities very sensitive to error 
[Ref. 7]. 

The partitioning of the sample space into two parts, 

CHD and no CHD, implied that the subject in the healthy 
group was not suffering from any disease, and that a sub- 
ject in the CHD group was suffering from CHD only. Other 
diseases may have adversely affected the test results of 
either group. In the formation of the sample, care was 
taken to eliminate all subjects that had other diseases. 

In the determination of positive or negative family 
history, the age of 65 was considered the cut-off. If a 






blood relative had CHD prior to age 65, the result was 
positive. Although this cut-off was arbitrary, it was the 
one most consistent with the available literature. It 
can be easily changed, however, if another cut-off is 
desired.

When checking for chest pain, the existence of any 
chest pain that was not categorized as angina was listed 
as undetermined origin since none of the subjects were 
known to have diseases which might explain the pain. 

The reading of the resting EKG was done by a cardiologist 
whose experience and subjective opinions must be considered 
an important part of the data. 






VI. SENSITIVITY



Sensitivity analysis was conducted in the following
areas:

1. The effect of weighting factors on the ordinal
ranking of the qualitative indicators was investigated.
Table 2 lists the two sets of weighting factors that were
compared. Changing the weighting factors markedly influenced
the diagnostic ordering of the indicators, as shown in Table 3.



TABLE 2

                                   Bayes'       Sample Clinical
                                   Weighting    Judgment
Test                               Factor       Weighting Factor

Blood Type
    A                              1.3          2
    Other                          1            1
Family History
    Positive                       1.3          2
    Negative                       1            1
Smoking History (per day)
    Non-smokers                    1            1
    Less than 1/2 pack             4.6          2
    About 1 pack                   4.5          3
    Greater than 1 pack            6.3          4
History of Ischemic Episodes
    None                           1            1
    Chest pain                     8            2
    Typical angina                 335          3
Resting EKG
    Normal                         1            1
    Other                          4            2
    ST-T abnormalities             20           3
    Pathologic Q-waves             22.5         4
Race
    Caucasian                      6.5          1
    Negro                          1            2
    Mongolian                      1.5          3
Exercise EKG
    Normal                         1            1
    ST depression < 1 mm           25           2
    ST depression >= 1 mm          150          3



All other indicators were quantitative. The following
ordering of indicators and their associated F-statistics
resulted (Table 3):



TABLE 3

Bayesian Weighting Procedure

Indicator                          F-statistic

History of Ischemic Episodes       97.5709
Exercise EKG                       5.3225
Age                                3.2315
Resting EKG                        2.6744
Blood Type                         2.6532
Cholesterol                        2.2091
Density                            2.0640
Cigarette Smoking                  1.8119
Systolic Blood Pressure            .9659
Family History                     .1607
Triglycerides                      .0485

Diastolic Blood Pressure and Race were omitted because of
an insignificant F value for this particular sample.

Sample Clinical Weighting Procedure

Indicator                          F-statistic

Exercise EKG                       13.0179
History of Ischemic Episodes       7.0552
Density                            2.3373
Blood Type                         2.2919
Cholesterol                        2.1914
Systolic Blood Pressure            2.1383
Resting EKG                        1.7011
Cigarette Smoking                  1.5936
Family History                     1.1054
Diastolic Blood Pressure           .2963
Triglycerides                      .2573
Age                                .0243
Race                               .0169






2. Diagnostic accuracy was investigated by varying the
prior probability of disease, P(D), and assuming
$P(D \mid S_1, \ldots, S_n) \geq 0.5$ indicated CHD. The values in Table 4 were
determined from patients having eight or more test results.



TABLE 4

P(D)      False Negatives**      False Positives***

.04       9/50                   0/52
.10       6/50                   0/52
.20       5/50                   0/52
.30       5/50                   1/52
.40       3/50                   1/52
.50       2/50                   1/52

** False Negative = patient has CHD but is diagnosed as
not having CHD.

*** False Positive = patient does not have CHD but is
diagnosed as having CHD.



3. After the model had been developed and the condi- 
tional probabilities had been determined, data on CHD 
patients were obtained from Walter Reed Hospital. Using the 
originally determined probabilities, these patients were 
tested with the Bayes' diagnostic model and 12 out of 14 
were correctly diagnosed as having CHD. Again,
$P(D \mid S_1, \ldots, S_n) \geq 0.5$ indicated CHD.

The Walter Reed patients were then added to the original 
sample to update the prior probability of disease. The 






changes in the prior probabilities were so small that they 
had no effect on the diagnostic results. 

4. Diagnostic accuracy was investigated by varying 
that probability above which CHD would be indicated 
(Table 5) : 



TABLE 5

P(D | S_1,...,S_n)      False Negatives      False Positives

.1                      3/58                 1/52
.2                      4/58                 0/52
.3                      8/58                 0/52
.4                      9/58                 0/52
.5                      9/58                 0/52
.6                      9/58                 0/52
.7                      9/58                 0/52
.8                      11/58                0/52
.9                      15/58                0/52
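
The cut-off sweep behind a table of this kind can be sketched as below. This is an illustration only: the function name is hypothetical, the posteriors are made up rather than taken from the study's patients, and the rule that a posterior at or above the cut-off indicates CHD is an assumed reading of the criterion used in this section.

```python
def error_counts(posteriors_chd, posteriors_no_chd, cutoff):
    """Count false negatives and false positives at a given cut-off.

    A patient is called CHD when the posterior is at least the cut-off.
    """
    false_neg = sum(1 for p in posteriors_chd if p < cutoff)
    false_pos = sum(1 for p in posteriors_no_chd if p >= cutoff)
    return false_neg, false_pos


# Hypothetical posteriors for known CHD and known healthy patients.
chd = [0.95, 0.85, 0.45, 0.15, 0.05]
healthy = [0.02, 0.05, 0.12, 0.01]
sweep = {c: error_counts(chd, healthy, c) for c in (0.1, 0.2, 0.5, 0.9)}
```

Raising the cut-off can only add false negatives and remove false positives, which is the same monotone pattern the tabulated counts show.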






VII. RESULTS AND CONCLUSIONS 



As previously stated in Section I, the objectives of 
the study were to design a method for the evaluation of the 
diagnostic potential of available indicators of CHD and to 
present a systematic, quantitative procedure for aiding in 
its diagnosis. The indicators of CHD were investigated by 
comparing specific test results from a CHD sample and a 
healthy sample with no CHD. 

The stepwise discriminant analysis, as presented in 
Section IV. A. , using all variables was performed on a CHD 
sample size of 106 compared to a no CHD sample size of 56. 

The weighting factors were determined by the Bayesian
approach (tabulated in Table 2, Section VI). An important
result of the discriminant analysis program was the ordering
of variables and their associated F-statistics, which may
be viewed as an ordering of the relative diagnostic importance
of the tests (see Table 3, Section VI). This method
of assigning weighting factors to test results in conjunc- 
tion with discriminant analysis is a valid procedure for 
ordering the vector of tests in their diagnostic importance. 

It provides a means for a doctor at decision node $D_1$ (of
Figure 1) to determine which test provides the most additional
information $I_0'$ from those available to him. Additionally,
the method is particularly valuable and easily adapted to 
considering new indicators of disease where no definitive 






clinical judgment exists or doctors do not agree on the 
relative importance of test results. 

The Bayes' diagnostic model (Section IV. B.) was 
developed to provide a systematic, quantitative procedure 
for aiding in the diagnosis of CHD. It was evaluated by 
checking how well it diagnosed patients from a known CHD 
group and a known healthy group. The difficulty in obtaining
patients with all the required test results was noted
in Section V and resulted in extremely small samples with 
complete data to investigate. However, six out of seven 
of the CHD group were diagnosed correctly, and 33 out of 33 
of the no CHD group were diagnosed correctly. When only 
eight or more of the test results were available, the model 
diagnosed with 91% accuracy (41 out of 50 in the CHD group 
were diagnosed correctly and 52 out of 52 of the no CHD 
group were diagnosed correctly) . These results were based 
on using a posterior probability of disease of .50 as the 
cut-off probability (i.e., $P(D \mid S_1, \ldots, S_n) \geq .50$ indicated
CHD). The variation of the cut-off probability (see
Section VI) demonstrated that the diagnostic accuracy 
of the model was greatly influenced by the choice of the 
cut-off criterion. For example, using a cut-off of .20 
instead of .50 reduced the number of false negatives from 
nine to four while the number of false positives remained 
the same. 
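
The 91% figure quoted for patients with eight or more test results follows directly from the counts given in the text, pooling both groups:

```latex
\text{accuracy} = \frac{41 + 52}{50 + 52} = \frac{93}{102} \approx 0.91,
\qquad
\frac{41}{50} = 0.82 \ \text{(CHD group)},
\qquad
\frac{52}{52} = 1.00 \ \text{(no CHD group)}
```

The pooled figure is carried by the no CHD group; within the CHD group alone, 82% of patients were diagnosed correctly at the .50 cut-off.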

As a validation of the Bayes' diagnostic model, 14 
known CHD patients from Walter Reed Hospital were diagnosed 






by the model. Twelve o£ the 14 were diagnosed correctly. 

The validation is not conclusive because o£ the extremely 
small sample tested, but it does indicate that the method 
is promising. 

It may be desirable to use the methods presented in 
a screening program to identi£y people with high risk o£ 

CHD £rom a large population. Su££icient doctors may not 
be available to examine all o£ the people to be tested. As 
an example o£ the model's applicability to such a screening 
program (where a doctor is not required) diagnostic accuracy 
was investigated using the results o£ the in£ormation avail- 
able only [re£erred to in Figure 1 as and in Table 1 
as (A)]. The model diagnosed with 921 accuracy (19 out o£ 

24 in the CHD group were diagnosed correctly and 44 out o£ 

44 in the no CHD group were diagnosed correctly) . 

The Bayesian diagnostic model achieved a high degree of
diagnostic accuracy. It is easily implemented
and appears to be well adapted to screening studies where
a large population is involved. The model continuously
updates the available patient information from which the
conditional probabilities are calculated and may be useful
in indicating trends or fluctuations in the indicators of
disease.
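
The updating of conditional probabilities as patients are added can be pictured with a small sketch. It assumes the probabilities are estimated as relative frequencies within each group, which is an assumption made here for illustration; the class name, group labels, and data are hypothetical.

```python
from collections import defaultdict


class ConditionalTable:
    """Relative-frequency estimates of P(result | group) for one indicator."""

    def __init__(self):
        self.counts = {"CHD": defaultdict(int), "NO_CHD": defaultdict(int)}

    def add_patient(self, group, result):
        """Record one more patient of known status with the given result."""
        self.counts[group][result] += 1

    def prob(self, group, result):
        """Current estimate of P(result | group)."""
        total = sum(self.counts[group].values())
        if total == 0:
            raise ValueError("no patients recorded for group " + group)
        return self.counts[group][result] / total


# Hypothetical exercise EKG results for eight patients of known status.
table = ConditionalTable()
for result in ["abnormal", "abnormal", "normal", "abnormal"]:
    table.add_patient("CHD", result)
for result in ["normal", "normal", "abnormal", "normal"]:
    table.add_patient("NO_CHD", result)

before = table.prob("CHD", "abnormal")
table.add_patient("CHD", "normal")      # a newly confirmed CHD patient
after = table.prob("CHD", "abnormal")
print(before, after)
```

A result never observed in a group gets an estimate of zero under this scheme, which would force the posterior to zero or one, so a practical version would need some treatment of sparse counts.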






VIII. AREAS FOR FUTURE STUDY 



As pointed out previously (see Section IV.A), one of
the main advantages of the approach followed in the paper is
the easy expansion of the number of variables and the number
of patients to be tested. This implies that as the number
of variables is increased, the diagnosis of CHD will improve.
The expanded list of variables could also be used to pre- 
dict other diseases. Instead of a space of CHD and no 
CHD, there is a space of CHD plus other diseases limited 
only by logical considerations such as the time, money, 
availability of computational equipment, etc. The integra- 
tion of this expanded prediction model into routine physical 
examinations and patient history could allow preliminary 
diagnosis prior to consultations with doctors, helping to 
reduce costs and the increasing patient load of doctors. 
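
The move from a two-state space (CHD, no CHD) to several diseases can be sketched as a single Bayes step over more than two hypotheses. The sketch assumes the hypotheses are mutually exclusive and exhaustive; the disease labels and every probability below are hypothetical and are not taken from the thesis data.

```python
def update(priors, likelihoods):
    """One Bayes step over mutually exclusive hypotheses.

    priors:      dict mapping hypothesis -> prior probability
    likelihoods: dict mapping hypothesis -> P(observed result | hypothesis)
    """
    joint = {h: priors[h] * likelihoods[h] for h in priors}
    total = sum(joint.values())
    if total == 0:
        raise ValueError("observed result has zero probability under all hypotheses")
    return {h: p / total for h, p in joint.items()}


# Hypothetical priors and likelihoods for one observed symptom.
priors = {"CHD": 0.04, "OTHER": 0.06, "NONE": 0.90}
p_symptom = {"CHD": 0.60, "OTHER": 0.20, "NONE": 0.05}

posterior = update(priors, p_symptom)
for hypothesis, p in posterior.items():
    print(f"{hypothesis}: {p:.3f}")
```

With only two hypotheses this reduces to the CHD versus no CHD update used in the present model.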

As presently modeled, diagnosis is based on results of 
samples from diseased and non-diseased groups. However, 
as more samples are obtained and a history of the patient's 
variables (i.e., changes in blood pressure over several 
years) is made, the model could be modified to diagnose on 
the basis of change in a patient's variables rather than by 
comparison with a norm. This would improve diagnosis among 
persons suffering from one disease where the diagnosis is 
being complicated by the existence of another disease. 

The extension of the model to include the diagnosis of
women would require only a change in the prior probability
to include a test for sex. Additionally, a statistical
check of the indicators would be necessary to determine
whether a new data base including women would be required
if women were to be tested.

Once a person has been found to have CHD, a system to 
monitor his progress under dieting and exercise control 
could be developed from the present model. This could 
allow a technician rather than a doctor to periodically 
check the patient's indicators. 

The definitions used for positive tests throughout 
this study were based on current information. Both a
statistical and medical investigation in this area to 
better define test results could greatly improve future 
models developed on the same principles. 

A model to predict the cost of implementing and 
operating the proposed diagnostic model should be explored.






APPENDIX A: SAMPLE DATA COLLECTION SHEET 



Name 

Date 



RACE: CAU NEG MON

Sex Height Blood Pressure

Age Weight Blood Type

Family History: Any of the following diagnosed heart diseases 

(circle) 

Father Uncle 

Mother Brother Unknown None 

Aunt Sister 

Any of the following died of heart disease (circle) 

Father Uncle 

Mother Brother Unknown None 

Aunt Sister 

Cigarette smoking in excess of one year? Yes No 

If yes: less than 1/2 pack per day 

one pack per day 
more than one pack per day 

History of Ischemic episodes: 

Chest pain, undetermined origin 

Typical angina 

None 

Resting EKG: 

Normal 

ST-T abnormalities 
Pathologic Q waves 
Other 

Exercise EKG: 

Neg 

ST depression greater than 1 mm 
ST depression greater than 2 mm 
ST elevation 

Triglycerides 

Cholesterol 

Max Heart Rate Attained during Exercise Test 






APPENDIX B 

BAYES’ DIAGNOSTIC MODEL, FORTRAN FLOW CHART 




[Flow chart not recoverable from the scan. Surviving labels: connector "A"; "... disease for each ..."; "Print the Patient's Results".]







APPENDIX C: BAYES' DIAGNOSTIC MODEL FORTRAN PROGRAM LISTING



C     BAYES' DIAGNOSTIC MODEL FOR CORONARY HEART DISEASE
C
      REAL*8 B,C,D,E,F,G,S,T,U,V,BC,CD,DE,EF,FG,GS,ST,TU,UV,
     1W,XY,YZ,ZY,YX,ZF,ZW,ZM,ZB,ZH,ZO,ZG,ZN,ZC,ZD,VW,A,ZA
C
C     READ IN TITLES FOR OUTPUT
C
      READ(5,99) A,B,C,D,E,F,G,S,T,U,V,W,BC,CD,DE,EF,FG,GS,
     1ST,TU,UV,VW,XY,YZ,ZY,YX,ZF,ZW,ZM,ZA,ZB,ZH,ZO,ZG,ZN,ZC,
     1ZD
   99 FORMAT(10A8)

C     READ IN PROBABILITIES OF SYMPTOMS GIVEN NO CHD
C
C     CODE FOR INPUT OF PROBABILITIES OF SYMPTOMS GIVEN
C     NO CHD (AND CHD)
C     1ST LETTER
C       P-PROBABILITY OF
C     2ND AND 3RD LETTER
C       BD-DIASTOLIC PRESSURE
C       BS-SYSTOLIC PRESSURE
C       BT-BLOOD TYPE
C       CI-CIGARETTE HABITS
C       CH-CHOLESTEROL
C       EE-EXERCISE EKG
C       FN-FAMILY HISTORY
C       HE-HISTORY OF ISCHEMIA
C       RE-RESTING EKG
C       RN-RACE NEGRO
C       RC-RACE CAUCASIAN
C       RM-RACE MONGOLIAN
C       TY-TRIGLYCERIDE
C     4TH LETTER IN A FOUR LETTER CODE
C       D-CHD
C       N-NO CHD
C     4TH LETTER IN A FIVE LETTER CODE
C       A-ABNORMAL WHEN PRECEDED BY RE, TY, CH, BS, OR BD
C       A-ANGINA WHEN PRECEDED BY HE
C       A-BLOOD TYPE A WHEN PRECEDED BY BT
C       G-GREATER THAN 1MM DEPRESSION WHEN PRECEDED BY EE
C       G-GREATER THAN 1 WHEN PRECEDED BY CI
C       H-1/2 PACK
C       N-NEGATIVE OR NONE
C       O-BLOOD TYPE O, AB OR B WHEN PRECEDED BY BT
C       O-1 PACK WHEN PRECEDED BY CI
C       O-OTHER WHEN PRECEDED BY RE
C       P-PAIN UND. ORIGIN WHEN PRECEDED BY HE
C       P-POSITIVE
C       Q-PATH. Q WAVES
C     5TH LETTER
C       D-CHD
C       N-NO CHD
C
      READ(5,100) PRNN,PRCN,PRMN,PBTAN,PBTON,PFNPN,PFNNN,
     1PCIHN,PCION,PCIGN,PCINN,PHEPN,PHEAN,PHENN,PRENN,PREAN,
     1PREQN,PREON,PEENN,PEEON,PEEGN,PTYNN,PTYAN,PCHAN,PCHNN,
     1PBSAN,PBSNN,PBDAN,PBDNN,PDA
  100 FORMAT(8F10.6)
C
C     READ IN PROBABILITIES OF SYMPTOMS GIVEN CHD
C
      READ(5,101) PRND,PRCD,PRMD,PBTAD,PBTOD,PFNPD,PFNND,
     1PCIHD,PCIOD,PCIGD,PCIND,PHEPD,PHEAD,PHEND,PREND,PREAD,
     1PREQD,PREOD,PEEND,PEEOD,PEEGD,PTYND,PTYAD,PCHAD,PCHND,
     1PBSAD,PBSND,PBDAD,PBDND
  101 FORMAT(8F10.6)
      I=0






C     READ IN PATIENT'S TEST RESULTS
C
C     CODE FOR INPUT OF PATIENT'S TEST RESULTS
C     INDIC-NOT USED  COL. 1
C     R-RACE  COL. 6
C       1-NEGRO
C       2-CAUCASIAN
C       3-MONGOLIAN
C     Z-NOT USED  COLS. 9-11
C     BS-SYSTOLIC PRESSURE  COLS. 14-16  NUMERICAL VALUE
C     BD-DIASTOLIC PRESSURE  COLS. 19-21  NUMERICAL VALUE
C     BT-BLOOD TYPE  COL. 24
C       1-A
C       2-O
C       3-B
C       4-AB
C     FN-FAMILY HISTORY  COL. 27
C       1-POSITIVE
C       2-NEGATIVE
C     CI-CIGARETTE HABITS  COL. 30
C       1-1/2 PACK
C       2-ONE PACK
C       3-GREATER THAN ONE
C       4-NONE
C     HIE-HISTORY OF ISCHEMIA  COL. 33
C       1-PAIN OF UNDETERMINED ORIGIN
C       2-ANGINA
C       3-NONE
C     RE-RESTING EKG  COL. 36
C       1-NORMAL
C       2-ABNORMAL S-T SEGMENT
C       3-Q-WAVES
C       4-OTHER
C     EE-EXERCISE EKG  COL. 39
C       1-NORMAL
C       2-LESS THAN 1/2 MM
C       3-1/2 TO 1 MM DEPRESSION
C       4-GREATER THAN 1MM DEPRESSION OR A ST ELEVATION
C     TRY-TRIGLYCERIDE  COL. 42
C       1-NORMAL
C       2-ABNORMAL
C     CHL-CHOLESTEROL  COLS. 45-47  NUMERICAL VALUE
C     AGE-AGE  COLS. 50-51  NUMERICAL VALUE
C
   50 READ(5,103) INDIC,R,Z,BS,BD,BT,FN,CI,HIE,RE,EE,TRY,CHL,
     1AGE
  103 FORMAT(A1,4X,F1.0,2X,F3.0,2X,F3.0,2X,F3.0,2X,F1.0,2X,
     1F1.0,2X,F1.0,2X,F1.0,2X,F1.0,2X,F1.0,2X,F1.0,2X,F3.0,
     12X,F2.0)
C
C     ZERO COUNT OF MISSING TESTS
C
      MAGE=0
      MR=0
      MBT=0
      MFN=0
      MCI=0
      MHE=0
      MRE=0
      MEE=0
      MTY=0
      MCL=0
      MBS=0
      MBD=0
      J=0
      I=I+1
C
C     CHECK FOR LAST CARD
      IF(R.EQ.9)GO TO 5000






C     CHECK FOR AGE
C
      IF(AGE.EQ.0.0)GO TO 200
C
C     MAKE FIRST CHECK FOR AGE GROUP, LESS THAN 34, 34 TO
C     44, OVER 44
C     AFTER DETERMINATION ASSIGN PRIOR PROBABILITY OF CHD
C
      IF(AGE.GT.34)GO TO 120
      PD=.01
      GO TO 190
  120 IF(AGE.GT.44)GO TO 130
      PD=.04
      GO TO 190
  130 PD=.07
  190 WRITE(6,191) I
  191 FORMAT('0',/,23X,'SUBJECT #',2X,I3)
      GO TO 201
  200 PD=PDA
      MAGE=MAGE+1
C
C     CHECK TO DETERMINE RACE, RECALCULATE PROBABILITY OF
C     CHD
C
  201 IF(R.EQ.0.0)GO TO 301
      IF(R.EQ.1.0)PDR=(PRCD*PD)/((PRCD*PD)+(PRCN*(1.0-PD)))
      IF(R.EQ.2.0)PDR=(PRND*PD)/((PRND*PD)+(PRNN*(1.0-PD)))
      IF(R.EQ.3.0)PDR=(PRMD*PD)/((PRMD*PD)+(PRMN*(1.0-PD)))
      GO TO 302
  301 PDR=PD
      MR=MR+1

C     CHECK TO DETERMINE BLOOD TYPE, RECALCULATE PROBABILITY
C     OF CHD
C
  302 IF(BT.EQ.0.0)GO TO 401
      IF(BT.EQ.1.0)PDBT=(PBTAD*PDR)/((PBTAD*PDR)+
     1(PBTAN*(1.0-PDR)))
      IF(BT.EQ.2.0)PDBT=(PBTOD*PDR)/((PBTOD*PDR)+
     1(PBTON*(1.0-PDR)))
      IF(BT.EQ.3.0)PDBT=(PBTOD*PDR)/((PBTOD*PDR)+
     1(PBTON*(1.0-PDR)))
      IF(BT.EQ.4.0)PDBT=(PBTOD*PDR)/((PBTOD*PDR)+
     1(PBTON*(1.0-PDR)))
      GO TO 402
  401 PDBT=PDR
      MBT=MBT+1
C
C     CHECK TO DETERMINE FAMILY HISTORY, RECALCULATE PROB-
C     ABILITY OF CHD
C
  402 IF(FN.EQ.0.0)GO TO 501
      IF(FN.EQ.1.0)PDFN=(PFNPD*PDBT)/((PFNPD*PDBT)+
     1(PFNPN*(1.0-PDBT)))
      IF(FN.EQ.2.0)PDFN=(PFNND*PDBT)/((PFNND*PDBT)+
     1(PFNNN*(1.0-PDBT)))
      GO TO 502
  501 PDFN=PDBT
      MFN=MFN+1

C     CHECK TO DETERMINE CIGARETTE HABITS, RECALCULATE PROB-
C     ABILITY OF CHD
C
  502 IF(CI.EQ.0.0)GO TO 601
      IF(CI.EQ.1.0)PDCI=(PCIHD*PDFN)/((PCIHD*PDFN)+
     1(PCIHN*(1.0-PDFN)))
      IF(CI.EQ.2.0)PDCI=(PCIOD*PDFN)/((PCIOD*PDFN)+
     1(PCION*(1.0-PDFN)))
      IF(CI.EQ.3.0)PDCI=(PCIGD*PDFN)/((PCIGD*PDFN)+
     1(PCIGN*(1.0-PDFN)))
      IF(CI.EQ.4.0)PDCI=(PCIND*PDFN)/((PCIND*PDFN)+
     1(PCINN*(1.0-PDFN)))
      GO TO 602
  601 PDCI=PDFN
      MCI=MCI+1
C
C     CHECK TO DETERMINE HISTORY OF ISCHEMIC EPISODES, RECAL-
C     CULATE PROBABILITY OF CHD
C
  602 IF(HIE.EQ.3.0)GO TO 701
      IF(HIE.EQ.1.0)PDHE=(PHEPD*PDCI)/((PHEPD*PDCI)+
     1(PHEPN*(1.0-PDCI)))
      IF(HIE.EQ.2.0)PDHE=(PHEAD*PDCI)/((PHEAD*PDCI)+
     1(PHEAN*(1.0-PDCI)))
      IF(HIE.EQ.3.0)PDHE=(PHEND*PDCI)/((PHEND*PDCI)+
     1(PHENN*(1.0-PDCI)))
      GO TO 702
  701 PDHE=PDCI
      MHE=MHE+1

C     CHECK TO DETERMINE RESTING EKG RESULTS, RECALCULATE
C     PROBABILITY OF CHD
C
  702 IF(RE.EQ.0.0)GO TO 801
      IF(RE.EQ.1.0)PDRE=(PREND*PDHE)/((PREND*PDHE)+
     1(PRENN*(1.0-PDHE)))
      IF(RE.EQ.2.0)PDRE=(PREAD*PDHE)/((PREAD*PDHE)+
     1(PREAN*(1.0-PDHE)))
      IF(RE.EQ.3.0)PDRE=(PREQD*PDHE)/((PREQD*PDHE)+
     1(PREQN*(1.0-PDHE)))
      IF(RE.EQ.4.0)PDRE=(PREOD*PDHE)/((PREOD*PDHE)+
     1(PREON*(1.0-PDHE)))
      GO TO 802
  801 PDRE=PDHE
      MRE=MRE+1
C
C     CHECK TO DETERMINE EXERCISE EKG RESULTS, RECALCULATE
C     PROBABILITY OF CHD
C
  802 IF(EE.EQ.0.0)GO TO 901
      IF(EE.EQ.1.0)PDEE=(PEEND*PDRE)/((PEEND*PDRE)+
     1(PEENN*(1.0-PDRE)))
      IF(EE.EQ.2.0)PDEE=(PEEOD*PDRE)/((PEEOD*PDRE)+
     1(PEEON*(1.0-PDRE)))
      IF(EE.GE.3.0)PDEE=(PEEGD*PDRE)/((PEEGD*PDRE)+
     1(PEEGN*(1.0-PDRE)))
      GO TO 902
  901 PDEE=PDRE
      MEE=MEE+1
C
C     CHECK TO DETERMINE TRIGLYCERIDES RESULTS, RECALCULATE
C     PROBABILITY OF CHD
C
  902 IF(TRY.EQ.0.0)GO TO 925
      IF(TRY.EQ.1.0)PDTY=(PTYND*PDEE)/((PTYND*PDEE)+
     1(PTYNN*(1.0-PDEE)))
      IF(TRY.GE.2.0)PDTY=(PTYAD*PDEE)/((PTYAD*PDEE)+
     1(PTYAN*(1.0-PDEE)))
      GO TO 926
  925 PDTY=PDEE
      MTY=MTY+1

C     CHECK TO DETERMINE CHOLESTEROL RESULTS, RECALCULATE
C     PROBABILITY OF CHD
C
  926 IF(CHL.EQ.0.0)GO TO 950
      IF(AGE.GT.29.0)GO TO 930
      IF(CHL.GT.240)PDCL=(PCHAD*PDTY)/((PCHAD*PDTY)+
     1(PCHAN*(1.0-PDTY)))
      IF(CHL.LE.240)PDCL=(PCHND*PDTY)/((PCHND*PDTY)+
     1(PCHNN*(1.0-PDTY)))
      GO TO 951
  930 IF(AGE.GT.39.0)GO TO 935
      IF(CHL.GT.270)PDCL=(PCHAD*PDTY)/((PCHAD*PDTY)+
     1(PCHAN*(1.0-PDTY)))
      IF(CHL.LE.270)PDCL=(PCHND*PDTY)/((PCHND*PDTY)+
     1(PCHNN*(1.0-PDTY)))
      GO TO 951
  935 IF(AGE.GT.49.0)GO TO 945
      IF(CHL.GT.310)PDCL=(PCHAD*PDTY)/((PCHAD*PDTY)+
     1(PCHAN*(1.0-PDTY)))
      IF(CHL.LE.310)PDCL=(PCHND*PDTY)/((PCHND*PDTY)+
     1(PCHNN*(1.0-PDTY)))
      GO TO 951
  945 IF(CHL.GT.330)PDCL=(PCHAD*PDTY)/((PCHAD*PDTY)+
     1(PCHAN*(1.0-PDTY)))
      IF(CHL.LE.330)PDCL=(PCHND*PDTY)/((PCHND*PDTY)+
     1(PCHNN*(1.0-PDTY)))
      GO TO 951
  950 PDCL=PDTY
      MCL=MCL+1
C
C     CHECK TO DETERMINE SYSTOLIC BLOOD PRESSURE, RECALCU-
C     LATE PROBABILITY OF CHD
C
  951 IF(BS.EQ.0.0)GO TO 975
      IF(BS.GE.140)PDBS=(PBSAD*PDCL)/((PBSAD*PDCL)+
     1(PBSAN*(1.0-PDCL)))
      IF(BS.LT.140)PDBS=(PBSND*PDCL)/((PBSND*PDCL)+
     1(PBSNN*(1.0-PDCL)))
      GO TO 976
  975 PDBS=PDCL
      MBS=MBS+1
C
C     CHECK TO DETERMINE DIASTOLIC BLOOD PRESSURE, RECALCU-
C     LATE PROBABILITY OF CHD
C
  976 IF(BD.EQ.0.0)GO TO 990
      IF(BD.GT.90)PDBD=(PBDAD*PDBS)/((PBDAD*PDBS)+
     1(PBDAN*(1.0-PDBS)))
      IF(BD.LE.90)PDBD=(PBDND*PDBS)/((PBDND*PDBS)+
     1(PBDNN*(1.0-PDBS)))
      GO TO 991
  990 PDBD=PDBS
      MBD=MBD+1



C     FORMAT OF OUTPUT INSTRUCTIONS
C
  991 WRITE(6,986) I
  986 FORMAT('0',3X,'THE FOLLOWING INFORMATION IS MISSING',
     1' ON SUBJECT #',2X,I3)
      IF(MAGE.EQ.0)GO TO 993
      WRITE(6,992)VW
  992 FORMAT(26X,A8)
      GO TO 994
  993 J=J+1
  994 IF(MR.EQ.0)GO TO 996
      WRITE(6,992)A
      GO TO 997
  996 J=J+1
  997 IF(MBT.EQ.0)GO TO 998
      WRITE(6,995)B,C
  995 FORMAT(26X,A8,A8)
      GO TO 999
  998 J=J+1
  999 IF(MFN.EQ.0)GO TO 1000
      WRITE(6,995)D,E
      GO TO 1001
 1000 J=J+1
 1001 IF(MCI.EQ.0)GO TO 1002
      WRITE(6,995)F,G
      GO TO 1003
 1002 J=J+1
 1003 IF(MHE.EQ.0)GO TO 1004
      WRITE(6,995)S,T
      GO TO 1005
 1004 J=J+1
 1005 IF(MRE.EQ.0)GO TO 1006
      WRITE(6,995)U,V
      GO TO 1007
 1006 J=J+1
 1007 IF(MEE.EQ.0)GO TO 1008
      WRITE(6,995)W,BC
      GO TO 1009
 1008 J=J+1
 1009 IF(MTY.EQ.0)GO TO 1010
      WRITE(6,995)CD,DE
      GO TO 1011
 1010 J=J+1
 1011 IF(MCL.EQ.0)GO TO 1012
      WRITE(6,995)EF,FG
      GO TO 1013
 1012 J=J+1
 1013 IF(MBS.EQ.0)GO TO 1014
      WRITE(6,995)GS,ST
      GO TO 1015
 1014 J=J+1
 1015 IF(MBD.EQ.0)GO TO 1016
      WRITE(6,995)TU,UV
      GO TO 1017
 1016 J=J+1
 1017 IF(J.LT.12)GO TO 1020
      WRITE(6,1018)
 1018 FORMAT('0',28X,'NONE')

 1020 WRITE(6,1021) I,PDBD
 1021 FORMAT('0',4X,'SUBJECT #',1X,I3,1X,'HAS PROBABILITY',
     12X,F8.6,2X,'OF CORONARY'/' ',5X,'HEART DISEASE GIVEN',
     1' RESULTS IN THE FOLLOWING TESTS')
      IF(MAGE.EQ.1)GO TO 1024
      WRITE(6,1022)VW,AGE
 1022 FORMAT(18X,A8,13X,F3.0)
 1024 IF(MR.EQ.1)GO TO 1026
      IF(R.EQ.1.0)WRITE(6,1023)A,ZW
      IF(R.EQ.2.0)WRITE(6,1023)A,ZF
      IF(R.EQ.3.0)WRITE(6,1023)A,ZM
 1023 FORMAT(18X,A8,13X,A8)
 1026 IF(MBT.EQ.1)GO TO 1029
      IF(BT.EQ.1.0)WRITE(6,1028)B,C,ZA
      IF(BT.GT.1.0)WRITE(6,1028)B,C,ZB
 1028 FORMAT(18X,A8,A8,5X,A8)
 1029 IF(MFN.EQ.1)GO TO 1030
      IF(FN.EQ.1.0)WRITE(6,1028)D,E,ZY
      IF(FN.EQ.2.0)WRITE(6,1028)D,E,YX
 1030 IF(MCI.EQ.1)GO TO 1031
      IF(CI.EQ.1.0)WRITE(6,1028)F,G,ZH
      IF(CI.EQ.2.0)WRITE(6,1028)F,G,ZO
      IF(CI.EQ.3.0)WRITE(6,1028)F,G,ZG
      IF(CI.EQ.4.0)WRITE(6,1028)F,G,ZN
 1031 IF(MHE.EQ.1)GO TO 1032
      IF(HIE.EQ.1.0)WRITE(6,1028)S,T,ZC
      IF(HIE.EQ.2.0)WRITE(6,1028)S,T,ZD
      IF(HIE.EQ.3.0)WRITE(6,1028)S,T,ZN
 1032 IF(MRE.EQ.1)GO TO 1033
      IF(RE.EQ.1.0)WRITE(6,1028)U,V,XY
      IF(RE.GE.2.0)WRITE(6,1028)U,V,YZ
 1033 IF(MEE.EQ.1)GO TO 1034
      IF(EE.EQ.1.0)WRITE(6,1028)W,BC,XY
      IF(EE.GE.2.0)WRITE(6,1028)W,BC,YZ
 1034 IF(MTY.EQ.1)GO TO 1035
      IF(TRY.EQ.1.0)WRITE(6,1028)CD,DE,XY
      IF(TRY.GE.2.0)WRITE(6,1028)CD,DE,YZ
 1035 IF(MCL.EQ.1)GO TO 1037
      WRITE(6,1036)EF,FG,CHL
 1036 FORMAT(18X,A8,A8,5X,F4.0)
 1037 IF(MBS.EQ.1)GO TO 1038
      WRITE(6,1036)GS,ST,BS
 1038 IF(MBD.EQ.1)GO TO 50
      WRITE(6,1036)TU,UV,BD
      GO TO 50
 5000 STOP
      END
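
The core of the listing above is one update formula applied once per available test, with the posterior from each test becoming the prior for the next. The following Python sketch restates that logic compactly. It is an illustration, not a translation of the program: the function names, the dictionary layout, and all likelihood values are hypothetical, while the age thresholds and the priors of .01, .04 and .07 are those that appear in the listing. Applying the update one test at a time in this way treats the test results as conditionally independent given disease status.

```python
def prior_by_age(age, default_prior):
    """Prior P(CHD) by age group; age 0 means age is missing."""
    if age == 0:
        return default_prior
    if age <= 34:
        return 0.01
    if age <= 44:
        return 0.04
    return 0.07


def bayes_step(prior, p_given_chd, p_given_no_chd):
    """Posterior P(CHD) after one test result, by Bayes' rule."""
    numerator = p_given_chd * prior
    return numerator / (numerator + p_given_no_chd * (1.0 - prior))


def diagnose(age, results, tables, default_prior):
    """Apply bayes_step for each available test; code 0 marks a missing test.

    results: dict test name -> result code
    tables:  dict test name -> {code: (P(code | CHD), P(code | no CHD))}
    """
    posterior = prior_by_age(age, default_prior)
    missing = []
    for test, code in results.items():
        if code == 0:
            missing.append(test)   # posterior carried forward unchanged
            continue
        p_given_chd, p_given_no_chd = tables[test][code]
        posterior = bayes_step(posterior, p_given_chd, p_given_no_chd)
    return posterior, missing


# Hypothetical likelihoods for two tests (family history, exercise EKG).
tables = {
    "FN": {1: (0.5, 0.2), 2: (0.5, 0.8)},
    "EE": {1: (0.3, 0.9), 2: (0.7, 0.1)},
}
results = {"FN": 1, "BT": 0, "EE": 2}   # blood type missing
posterior, missing = diagnose(40, results, tables, default_prior=0.04)
print(f"posterior = {posterior:.4f}, missing = {missing}")
```

A table-driven loop replaces the chain of labelled IF statements, so adding an indicator means adding a table entry rather than a new block of code.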






APPENDIX D 



SAMPLE OUTPUT 



SUBJECT #  7

THE FOLLOWING INFORMATION IS MISSING ON SUBJECT #  7

        RACE
        BLOOD TYPE
        RESTING EKG
        TRIGLYCERIDE
        CHOLESTEROL

SUBJECT #  7 HAS PROBABILITY 0.979999 OF CORONARY
HEART DISEASE GIVEN RESULTS IN THE FOLLOWING TESTS

        AGE                  44.
        FAMILY HISTORY       NEGATIVE
        SMOKING HABITS       NONE
        ISCHEMIA HISTORY     ANGINA
        EXERCISE EKG         ABNORMAL
        SYSTOLIC PRESS.      150.
        DIASTOLIC PRESS.     100.



SUBJECT #  8

THE FOLLOWING INFORMATION IS MISSING ON SUBJECT #  8

        NONE

SUBJECT #  8 HAS PROBABILITY 0.999946 OF CORONARY
HEART DISEASE GIVEN RESULTS IN THE FOLLOWING TESTS

        AGE                  28.
        RACE                 WHITE
        BLOOD TYPE           A
        FAMILY HISTORY       NEGATIVE
        SMOKING HABITS       OVER ONE
        ISCHEMIA HISTORY     ANGINA
        RESTING EKG          ABNORMAL
        EXERCISE EKG         ABNORMAL
        TRIGLYCERIDE         ABNORMAL
        CHOLESTEROL          315.
        SYSTOLIC PRESS.      118.
        DIASTOLIC PRESS.     76.
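
Cholesterol and blood pressure are printed as numbers, so the output does not show which likelihood (normal or abnormal) the program used for them. The sketch below applies the cut-offs that appear in the Appendix C listing to the two sample subjects. The function and variable names are illustrative, and `None` stands in for a missing cholesterol value.

```python
def cholesterol_cutoff(age):
    """Age-dependent cholesterol cut-off used in the Appendix C listing."""
    if age <= 29:
        return 240
    if age <= 39:
        return 270
    if age <= 49:
        return 310
    return 330


def cholesterol_abnormal(age, chl):
    return chl > cholesterol_cutoff(age)


def systolic_abnormal(bs):
    return bs >= 140


def diastolic_abnormal(bd):
    return bd > 90


# Values taken from the sample output above.
subjects = {
    7: {"age": 44, "chl": None, "bs": 150, "bd": 100},
    8: {"age": 28, "chl": 315, "bs": 118, "bd": 76},
}

flags = {}
for number, s in subjects.items():
    flags[number] = {
        "cholesterol": None if s["chl"] is None
        else cholesterol_abnormal(s["age"], s["chl"]),
        "systolic": systolic_abnormal(s["bs"]),
        "diastolic": diastolic_abnormal(s["bd"]),
    }
    print(number, flags[number])
```

Subject 8's cholesterol of 315 counts as abnormal only because of the lower cut-off for the youngest age group; the same value would be treated as normal for a patient aged 50 or over.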






LIST OF REFERENCES 



1.  Blackburn, H., Measurement in Exercise Electrocardiography,
    Thomas, 1969.

2.  Brodman, K., "Diagnostic Decisions by Machine," IRE
    Transactions on Medical Electronics, July 1960.

3.  Chan, L. S. and Dunn, O. J., "The Treatment of Missing
    Values in Discriminant Analysis - 1. The Sampling
    Experiment," Journal of the American Statistical
    Association, v. 67, n. 338, June 1972.

4.  Cohn, P. F., and others, "A Quantitative Clinical
    Index for the Diagnosis of Symptomatic Coronary-
    Artery Disease," The New England Journal of Medicine,
    v. 286, n. 17, 27 April 1972.

5.  Dunn, O. J., "Some Expected Values for Probabilities of
    Correct Classification in Discriminant Analysis,"
    Technometrics, v. 13, n. 2, May 1971.

6.  Forsyth, J. W., A Manual of Exercise Testing, Medford
    Clinic, Medford, Oregon, January 1972.

7.  Gessaman, M. P. and Gessaman, P. H., "A Comparison of
    Some Multivariate Discrimination Procedures," Journal
    of the American Statistical Association, v. 67, n. 338,
    June 1972.

8.  Gustafson, D. H., and others, "Initial Evaluation of a
    Subjective Bayesian Diagnostic System," Health
    Services Research, Fall 1971.

9.  Kingsbury, K. J., "Relation of ABO Blood-Groups to
    Atherosclerosis," The Lancet, 30 January 1971.

10. Lusted, L. B., Introduction to Medical Decision Making,
    Thomas, 1959.

11. Nie, N., Bent, D. H., and Hull, C. H., Statistical
    Package for the Social Sciences, McGraw-Hill, 1970.

12. Page, I. H., and others, "Prediction of Coronary Heart
    Disease Based on Clinical Suspicion, Age, Total
    Cholesterol, and Triglyceride," Circulation, v. XLII,
    October 1970.

13. Rao, C. R., "Recent Advances in Discriminatory Analysis,"
    Indian Society of Agricultural Statistics Journal, v. 21,
    1969.

14. Stamler, J., and others, "Primary Prevention of the
    Atherosclerotic Diseases," Circulation, v. XLII,
    December 1970.

15. Stanford University Department of Statistics Report 10,
    Asymptotic Evaluation of the Probabilities of
    Misclassification by Linear Discriminant Functions,
    by T. W. Anderson, 28 September 1972.

16. "The Heart Attack Epidemic," Life Extension, v. 3,
    n. 7, 1972.






INITIAL DISTRIBUTION LIST 



                                                        No. Copies

1.  Defense Documentation Center                             2
    Cameron Station
    Alexandria, Virginia 22314

2.  Library, Code 0212                                       2
    Naval Postgraduate School
    Monterey, California 93940

3.  Chief of Naval Personnel                                 1
    Pers 11b
    Department of the Navy
    Washington, D. C. 20370

4.  Naval Postgraduate School                                1
    Department of Operations Research
      and Administrative Sciences
    Monterey, California 93940

5.  Asst Professor Marlin U. Thomas, Code 55 To              1
    Department of Operations Research
    Naval Postgraduate School
    Monterey, California 93940

6.  Asst Professor Edward A. Brill, Code 55 Zg               1
    Department of Operations Research
    Naval Postgraduate School
    Monterey, California 93940

7.  Asst Professor William C. Giauque, Code 55 Gi            1
    Department of Operations Research
    Naval Postgraduate School
    Monterey, California 93940

8.  The Surgeon General                                      1
    Department of the Army
    Washington, D. C. 20314

9.  Commanding Officer                                       1
    Silas B. Hays Army Hospital
    Fort Ord, California 93941

10. Chief of Cardiology                                      1
    Walter Reed General Hospital
    Washington, D. C. 20012

11. Major James A. Fischer, M.D.                             1
    Cardiology Clinic
    Silas B. Hays Army Hospital
    Fort Ord, California 93941

12. Captain William R. Condos, Jr.                           2
    Department of Mathematics
    U.S. Military Academy
    West Point, New York 10996

13. Captain Everett W. Knox                                  2
    190 Glascoe Avenue
    Staten Island, New York 10314






DOCUMENT CONTROL DATA - R & D

(Security classification of title, body of abstract and indexing annotation must be entered when the overall report is classified)

ORIGINATING ACTIVITY (Corporate author)
    Naval Postgraduate School
    Monterey, California 93940

2a. REPORT SECURITY CLASSIFICATION
    Unclassified

2b. GROUP

REPORT TITLE
    A Bayesian Approach to Assist in the Diagnosis of Coronary
    Heart Disease

DESCRIPTIVE NOTES (Type of report and inclusive dates)
    Master's Thesis; March 1973

AUTHOR(S) (First name, middle initial, last name)
    William Randolph Condos, Jr. and Everett William Knox

REPORT DATE
    March 1973

7a. TOTAL NO. OF PAGES

7b. NO. OF REFS
    16

CONTRACT OR GRANT NO.

b. PROJECT NO.

9a. ORIGINATOR'S REPORT NUMBER(S)

9b. OTHER REPORT NO(S) (Any other numbers that may be assigned
    this report)

10. DISTRIBUTION STATEMENT
    Approved for public release; distribution unlimited.

11. SUPPLEMENTARY NOTES

12. SPONSORING MILITARY ACTIVITY
    Naval Postgraduate School
    Monterey, California 93940

ABSTRACT
    The objectives of this thesis were to design a method for
    evaluation of the diagnostic potential of indicators of
    coronary heart disease (CHD) and to present a systematic,
    quantitative procedure for aiding in its diagnosis. A sample
    space of patients was divided into two mutually exclusive
    groups, those with angiographic evidence of CHD, and those
    with no CHD. Active duty or retired military men between the
    ages of 30 and 67 years constituted the sample space. Tests
    and risk factors were available in the medical literature
    that a doctor could view as an indicator or contraindicator
    of CHD. A vector of these possible indicators was established
    and the diseased group was compared to the non-diseased group
    in an effort to evaluate the diagnostic potential of the
    indicators. This was done by discriminant analysis in
    conjunction with a Bayesian method of weighting the importance
    of test results. The important indicators were then used to
    formulate a model for diagnosing CHD based on a Bayes'
    decision technique.

DD FORM 1473 (NOV 65)
S/N 0101-807-6811

UNCLASSIFIED



Security Classification

UNCLASSIFIED

KEY WORDS

Bayes
Quantitative Analysis
Discriminant Analysis
Coronary Heart Disease
Medical Decision Making

DD FORM 1473
S/N 0101-807-6821

UNCLASSIFIED
Security Classification







